ABSP: an automated R tool to efficiently analyze region-specific CpG methylation from bisulfite sequencing PCR

Abstract Motivation Nowadays, epigenetic gene regulations are studied in each part of the biology, from embryonic development to diseases such as cancers and neurodegenerative disorders. Currently, to quantify and compare CpG methylation levels of a specific region of interest, the most accessible technique is the bisulfite sequencing PCR (BSP). However, no existing user-friendly tool is able to analyze data from all approaches of BSP. Therefore, the most convenient way to process results from the direct sequencing of PCR products (direct-BSP) is to manually analyze the chromatogram traces, which is a repetitive and prone to error task. Results Here, we implement a new R-based tool, called ABSP for analysis of bisulfite sequencing PCR, providing a complete analytic process of both direct-BSP and cloning-BSP data. It uses the raw sequencing trace files (.ab1) as input to compute and compare CpG methylation percentages. It is fully automated and includes a user-friendly interface as a built-in R shiny app, quality control steps and generates publication-ready graphics. Availability and implementation The ABSP tool and associated data are available on GitHub at https://github.com/ABSP-methylation-tool/ABSP. Supplementary information Supplementary data are available at Bioinformatics online.


Supplementary Information
Two DNA samples, high-methylated and low-methylated human genomic DNA (80-8061-HGHM5 and 80-8062-HGUM5 from EpigenDx), were treated with sodium bisulfite. 1.4 μg of each DNA sample was mixed with 0.3 M of NaOH and incubated at 50°C for 20 min. Then, DNA solutions were treated with a 2.5 M of sodium bisulfite / 125 mM of hydroquinone pH 5.0 solution, at 70°C for 3 h.
The single-stranded bisulfite converted DNA was then cleaned up using the NucleoSpin Gel and PCR Clean-up kit (Macherey-Nagel) following the manufacturer's instructions and converted DNA samples were stored at -80°C before PCR amplification.
An upstream promoter region of the CDH1 gene was amplified using a touchdown PCR protocol and directly sequenced in both directions.
The touchdown PCR protocol was composed of 50 cycles of: 20 s at 95°C, 30 s at annealing temperature, and 2 min at 72°C. The annealing temperature varies from 60°C for 10 cycles, to 59°C, 58°C, 57°C, and 56°C for 1 cycle each, and 55°C for 36 cycles.
The CDH1 amplified region is located on the plus strand at coordinates chr16:68771007-68771227 (reference human genome hg19) (221 bp) and covers 17 CpG sites. With the addition of the primers 5'tails T3 (20 bp) and BGH Reverse (18 bp), the length of the amplicon is 259 bp.
Amplicon sequence (bisulfite converted sequence with CpG sites considered as methylated highlighted in grey and standard primers T3 and BGH Reverse underlined): 5'-AATTAACCCTCACTAAAGGGTTTAGTAATTTTAGGTTAGAGGGTTATCGCGTTTATG CGAGGTCGGGTGGGCGGGTCGTTAGTTTCGTTTTGGGGAGGGGTTCGCGTTGTTGATTG GTTGTGGTCGGTAGGTGAATTTTTAGTTAATTAGCGGTACGGGGGGCGGTGTTTTCGGG GTTTATTTGGTTGTAGTTACGTATTTTTTTTTAGTGGCGTCGGAATTGTAAAGTATTTGT GAGTTTCCTCGACTGTGCCTTCTA-3' The ABSP analysis of results was performed using as a reference DNA sequence the plus strand genomic sequence at coordinates chr16:68770940-68771280 (reference human genome hg19): 5'-CCACCGGCGGGGCTGGGATTCGAACCCAGTGGAATCAGAACCGTGCAGGTCCCATA ACCCACCTAGACCCTAGCAACTCCAGGCTAGAGGGTCACCGCGTCTATGCGAGGCCGG GTGGGCGGGCCGTCAGCTCCGCCCTGGGGAGGGGTCCGCGCTGCTGATTGGCTGTGGC CGGCAGGTGAACCCTCAGCCAATCAGCGGTACGGGGGGCGGTGCCTCCGGGGCTCACC TGGCTGCAGCCACGCACCCCCTCTCAGTGGCGTCGGAACTGCAAAGCACCTGTGAGCT TGCGGAAGTCAGTTCAGACTCCAGCCCGCTCCAGCCCGGCCCGACCCGACCGC-3' 2 Supplementary Figures   Fig. S1 Genomic plot of CDH1 methylation levels in the high-methylated DNA 3 sample from the individual analysis. Methylation percentages are displayed as a linear grey gradient along the genomic sequence.

Fig. S3
Boxplots of CDH1 methylation distribution between groups for each CpG position. Some data points are missing in the low-methylated DNA samples for the last 4 CpG sites.The Student's T-test p-values are displayed at CpG sites for which the T-test could be performed on the data. The original figure was rearranged to fit to page.

Fig. S4
Example of a trace data file resulting from an ESME analysis. A direct-BSP experiment was conducted on a region in the EPHB2 gene, located at coordinates chr1:23176079-23176408 (hg19). The sequencing run from the reverse primer was analyzed by the ESME software. The file name contains the "C1" mention to indicate the use of a C-rich (reverse) primer, designed from the plus/top strand (1) of genomic DNA. Hence the trace data are representing the reverse complement of the original sequencing result. The first chromatogram trace corresponds to the orignal trace without normalization and the second to the normalized trace. CpG sites are higlighted in yellow and methylation percentages calculated from traces are indicated below each one of them. Tab. S1 Comparison of methylation percentages obtained from both the ESME and ABSP analysis on the 5 CpG sites within the chr1:23176079-23176408 (hg19) region located in the EPHB2 gene.